Can we predict the outcome of a League of Legends game based on the first 10 minutes?

Semester 4 personal challenge 2022

Made by Megin van Herk



Domain understanding

League of Legends is one of the world's most popular video games, developed by Riot Games. It features a team-based competitive game mode based on strategy and outplaying opponents. Players work with their team to break the enemy Nexus before the enemy team breaks theirs.


A lot of different statistics go into a single game of League. Some seem to have more impact than others. People assume that the team with more kills will win, but there are also objectives, such as turrets, monsters and minions. The amount of data that actually affects the outcome of a game might surprise you.

The game also lets you surrender after 15 minutes. This usually happens when you believe you can no longer win: either the enemy has killed you a lot and you are not having fun anymore, or you just don’t think this game is winnable. But of course, you never know whether you can still turn the tide. There is always a chance that something changes later in the game and you win after all. This is a gamble some people are not willing to make, although “never surrender” is a commonly used phrase. But what if you knew whether you were going to win? What if a program showed you, 10 minutes into a game, what your chances of winning are? This could save a lot of people a lot of time, or motivate them to keep playing because they know their chances of winning.


Very short summary of interview:
“Do you surrender your League of Legends games a lot?”

50% of games are surrendered.

“How often do you find yourself thinking you’re going to lose the game but end up winning anyway?”

Most of the time we have surrendered by then, so not very often. If we don’t think we’re going to win, we can keep trying to win, but it will take a lot of effort, so we would rather surrender.

“If you knew your percentage chance of winning the game, what kind of impact would this have on your choice to surrender? For example, 10 minutes into the game, a program tells you that you have a 30% chance of winning.”

No effect, every minion can change the course of the game.

It would lead people to surrender more often,

or give people a false sense of hope.

I would not download this.

“What if the application told you what to improve to win the game? For example: "If you get 3 more kills, your chance of winning is 70%."”

This would motivate me to keep playing the game.

I would download this.

(For the full transcript, see the assignment "Investigative Problem Solving".)


The conclusion of this interview is that the simple "Will you win or not?" question is too vague for users. They need a more specific reason why they are winning or losing, which will help them improve their skills.

Why is this research needed?

When you think you are bad at a video game, keep losing and can't seem to win, you may be tempted to just quit. But what if you knew exactly what you needed to do to win? That is why this research is needed: to motivate people to keep playing League of Legends, improve themselves and improve the game.

What is the goal of the project?

The goal of the project is to help players improve, based on their own games and those of other players. Showing a player, during or after a game, an area in which they can improve or an area they have mastered can really motivate them to do better and keep playing.

Who is going to use our product?

My target user is a League player looking to improve their win rate. The product is the best fit for beginner or intermediate players. Experienced players don't need to be told their chances, and since the accuracy of the program isn't 100%, they probably know better themselves. The esports scene, for example, has its own knowledge and coaches, so this project is not meant for them.

How will we reach the end product for the client?

It could be possible to build an app on this data that provides an in-game display to keep users up to date with their stats and their chances of winning. Based on the interview, it's not enough to just show the win percentage. It's better to dive deeper into the stats and show the user why they are winning or losing.

Data sourcing

The data is acquired from the official Riot Games API (Riot Games is the company behind League of Legends) and can therefore be considered accurate and reliable.

https://developer.riotgames.com/apis#account-v1

About the dataset

This dataset contains info on the first 10 minutes of over 50k games. It contains all values that impact a game:

Analytic approach

At first sight, this is a classification problem: “Will I win or lose?”. A regression algorithm could fit as well, answering questions like “How many kills do I need to win?” or “How often can I die without losing?”. We will try both to see which one works best.


Approach 1:

Data requirements

Our target variable is whether we win or lose. This is stored in both “blueWins” and “redWins”: if blue wins the game, then blueWins = 1 and redWins = 0. We only need one of the two, because blueWins = 0 implies redWins = 1. So we use only “blueWins” as our target variable.

Later on, we check whether choosing “redWins” and the red team's data instead makes a difference.

Data collection

This dataset includes both the red team's and the blue team's data, so the outcome is stored twice: if blue loses the game, red automatically wins. We therefore don't need both teams' data. For now we will focus on the blue team's data.
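As a minimal sketch of this selection step, the snippet below checks that the two labels are complementary and keeps only the blue team's columns. The frame is a tiny hand-made stand-in with illustrative values, not the real dataset, and it assumes every blue-team column name starts with "blue".

```python
import pandas as pd

# Tiny hand-made frame standing in for the real dataset (values are illustrative).
games = pd.DataFrame({
    "gameID": [1, 2, 3, 4],
    "blueWins": [1, 0, 0, 1],
    "redWins": [0, 1, 1, 0],
    "blueKills": [9, 4, 5, 7],
    "redKills": [3, 8, 6, 5],
})

# Exactly one team wins each game, so the two labels always sum to 1.
labels_are_complementary = bool((games["blueWins"] + games["redWins"] == 1).all())

# Keep only the blue team's columns; "blueWins" becomes the target.
blue_columns = [c for c in games.columns if c.startswith("blue")]
blue_games = games[blue_columns]
target = blue_games["blueWins"]
features = blue_games.drop(columns="blueWins")
print(labels_are_complementary, list(features.columns))
```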

Data understanding

To better understand the dataset, it helps to plot some graphs. Without normalization, the values cannot be read properly: the plot is just one big blob. With normalized data, you can clearly see the lines connecting the values. Putting everything on the same scale gives a better overview.

This parallel coordinates plot gives you a lot of information. For example, you can see that teams with more kills have fewer deaths, that teams with fewer kills have more deaths, and that there are a lot more losses (0) among teams with more deaths.

In this plot you can clearly see a divide between winning and losing. For these columns, more is better: more gold and more levels help to win the game.

This scatterplot compares wards placed (vision over the map) with kills. In the game it makes sense that if you can see the enemy, you can kill the enemy more often. The scatterplot supports this: the fewer wards, the more blue losses (0) there are. But the relationship is not clear-cut, since a team that placed a lot of wards can still lose. Will you also die more when you have less vision? The plot next to it shows deaths and wards. Most wins are on the left side of that scatterplot, where teams have fewer deaths, which makes sense. The number of wards a team places does not seem to matter for winning. You can see a clear division between winning and losing, but it is based mainly on kills and deaths; wards do not seem to have an impact. Deaths and kills have a much higher impact on the outcome of the game, so we will focus on those.

Here you can clearly see the line between winning and losing, excluding the outlier of course. The more deaths and the fewer kills, the more likely you are to lose; the more kills and the fewer deaths, the more likely you are to win.

I think these variables are the best ones to perform our machine learning on, since they seem to impact the game the most. Of course we shouldn't disregard all the other variables, so below is a pairplot with all the variables to see which other ones might be good to use as well.
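The text does not say which normalization was used, so the sketch below assumes min-max scaling, which puts every column in the range 0 to 1. The values and the column name "blueTotalGold" are illustrative assumptions.

```python
import pandas as pd

# Illustrative values only; gold is on a much larger scale than kills.
stats = pd.DataFrame({
    "blueKills": [2, 6, 9, 4],
    "blueDeaths": [8, 5, 3, 6],
    "blueTotalGold": [14500, 16800, 19200, 15900],  # hypothetical column name
})

# Min-max normalization: rescale every column to the range [0, 1].
normalized = (stats - stats.min()) / (stats.max() - stats.min())
print(normalized.round(2))
```

A frame scaled like this can be passed to a parallel coordinates plot, for example pandas.plotting.parallel_coordinates, without the gold column flattening all the others.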

Data preparation

Preparation

Modelling

Approach 2:

Logistic regression

Data requirements

From the data, we can see that the outcome of a game is binary: “blueWins” is 1 when the blue team wins and 0 when the red team wins. Using the data from different aspects of the game, we want to be able to predict who will win the game.

Because the outcome we want to predict is binary, we will use logistic regression. Its output is always between 0 and 1 and can be read as the probability that blue wins, which lets us predict the winner.

Data collection

We will use all columns except gameID and blueWins. blueWins is our target variable.

Data understanding

Data preparation

Logistic regression

Logistic regression is a statistical method for predicting binary classes. The outcome or target variable is dichotomous in nature. Dichotomous means there are only two possible classes. In this case our "blueWins" is either 1 or 0.

It is related to linear regression, but the target variable is categorical. Instead of modelling the target directly, it models the log of the odds as a linear function of the features. Logistic regression predicts the probability of a binary event using the logistic (inverse logit) function.
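The relation between log odds and probability can be sketched in a few lines. The intercept and weight below are hypothetical numbers picked for illustration; they are not coefficients from our fitted model.

```python
import math

def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Map a probability p to its log odds; this is the inverse of sigmoid."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
intercept, kill_weight = -1.2, 0.2
blue_kills = 8
log_odds = intercept + kill_weight * blue_kills  # linear in the feature
win_probability = sigmoid(log_odds)
print(round(log_odds, 2), round(win_probability, 3))
```

Predicting "blue wins" whenever the probability is above 0.5 is the same as predicting it whenever the log odds are positive.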


Trying out different variables and their accuracy scores:

We shall use the columns with the highest score together in our model.
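One possible way to run this comparison is to fit a single-feature model per column and compare cross-validated accuracy. The data below is synthetic (kills and deaths drive the outcome, wards are noise by construction), the column name "blueWardsPlaced" is an assumption, and the scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in data: kills and deaths drive the outcome, wards are pure noise.
columns = {
    "blueKills": rng.poisson(6, n),
    "blueDeaths": rng.poisson(6, n),
    "blueWardsPlaced": rng.poisson(20, n),  # hypothetical column name
}
signal = columns["blueKills"] - columns["blueDeaths"] + rng.normal(0, 3, n)
blue_wins = (signal > 0).astype(int)

# Fit one single-feature model per column and record its cross-validated accuracy.
scores = {}
for name, values in columns.items():
    X = values.reshape(-1, 1).astype(float)
    scores[name] = cross_val_score(LogisticRegression(), X, blue_wins, cv=5).mean()

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```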


Modelling

The columns we chose for this model all affect each other and have a large impact on the game.

We split the data into a training set and a test set, and fit a logistic regression model to the training data.
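A minimal sketch of the split and fit with scikit-learn follows. The data is synthetic, and the 80/20 split and the random seeds are assumptions, not the settings used for the reported scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 600
# Synthetic stand-ins for 'blueKills' and 'redKills'.
blue_kills = rng.poisson(6, n)
red_kills = rng.poisson(6, n)
X = np.column_stack([blue_kills, red_kills]).astype(float)
y = (blue_kills - red_kills + rng.normal(0, 3, n) > 0).astype(int)

# Hold out 20% of the games for testing; stratify keeps the win/loss ratio equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```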

Evaluation

All the scores are similar to one another. Recall is the highest at 0.74. High recall means that the algorithm finds most of the relevant results, here most of the games that blue actually won. The scores range from 0.66 to 0.74, with an accuracy of about 0.7. Logistic regression has the highest accuracy score of the algorithms so far. In the context of our domain, we can predict from the first 10 minutes whether you are going to win your game with an accuracy of about 70%.
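To make the difference between these scores concrete, the sketch below computes accuracy, precision and recall by hand from ten hypothetical predictions. The labels are invented for illustration and are unrelated to the scores reported above.

```python
# Hypothetical true outcomes and predictions for ten games (1 = blue wins).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # wins predicted as wins
fp = sum(t == 0 and p == 1 for t, p in pairs)  # losses predicted as wins
fn = sum(t == 1 and p == 0 for t, p in pairs)  # wins predicted as losses
tn = sum(t == 0 and p == 0 for t, p in pairs)  # losses predicted as losses

accuracy = (tp + tn) / len(y_true)  # share of all games predicted correctly
precision = tp / (tp + fp)          # share of predicted wins that were real wins
recall = tp / (tp + fn)             # share of real wins that were found
print(accuracy, round(precision, 3), recall)
```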

AUC

AUC is the area under the ROC curve. It indicates how well the predicted probabilities of the positive class are separated from those of the negative class.
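One way to read the AUC is as the chance that a randomly picked win receives a higher predicted probability than a randomly picked loss. The sketch below computes it that way for four hypothetical games; roc_auc is a helper written for this illustration, not a library function.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random win is scored above a random loss."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # correctly ordered pair
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted win probabilities for four games.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)  # 0.75: three of the four win/loss pairs are ordered correctly
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.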

Approach 3:

SVMs with sigmoid kernel

• Support vectors are the data points that lie closest to the decision surface (or hyperplane).
• They are the data points that are most difficult to classify.
• They have a direct bearing on where the decision surface lies.

SVMs maximize the margin (Winston terminology: the ‘street’) around the separating hyperplane.
• The decision function is fully specified by a (usually very small) subset of training samples, the support vectors.
• This becomes a quadratic programming problem that is easy to solve by standard methods.
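The claim that only a small subset of the training samples defines the decision function can be checked on toy data. This sketch uses two well-separated synthetic clusters and a linear kernel, both chosen for illustration rather than taken from our dataset.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters, 20 points each.
class_0 = rng.normal(loc=-3.0, scale=0.5, size=(20, 2))
class_1 = rng.normal(loc=3.0, scale=0.5, size=(20, 2))
X = np.vstack([class_0, class_1])
y = np.array([0] * 20 + [1] * 20)

# A linear kernel makes the separating hyperplane a straight line.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

# Only the points closest to the boundary are kept as support vectors.
n_support = len(svm.support_vectors_)
print(f"{n_support} support vectors out of {len(X)} training points")
```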

info gathered from: https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm

Data understanding

The above graph shows that our data is evenly distributed: there are as many losses as wins.

Here we see that the more kills you have, the higher your chance of winning the game.

Here we see that the more deaths you have, the higher your chance of losing.

In this graph you can see that most kill counts occur in both wins and losses, but once a high number of kills is reached, the team stops losing.

Preprocessing

Data preprocessing is used in both database-driven and rules-based applications. In machine learning (ML) processes, data preprocessing is critical for ensuring large datasets are formatted in such a way that the data they contain can be interpreted and parsed by learning algorithms

I chose the columns 'blueKills' and 'redKills' because in my plots of the data those appear to be the most impactful variables.

We create an object of the StandardScaler class for the independent variables (features). We then fit it on the training dataset and transform that dataset.
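A sketch of this scaling step is shown below on synthetic kill counts. Applying the already fitted scaler to the test set, rather than refitting it, is common practice and an assumption here, since the text only mentions the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Synthetic kill counts standing in for 'blueKills' and 'redKills'.
X = rng.poisson(6, size=(200, 2)).astype(float)
y = (X[:, 0] > X[:, 1]).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuse them; do not refit on test data
print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```

After scaling, each training column has mean 0 and standard deviation 1, which matters for SVMs because the kernel depends on dot products between samples.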

Modelling

Kernel methods are a class of algorithms for pattern analysis or recognition, whose best known element is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (such as clusters, rankings, principal components, correlations, classifications) in general types of data

Choosing the most appropriate kernel highly depends on the problem at hand.

Sigmoid kernel

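The sigmoid kernel originates from neural networks, where the same function serves as an activation function. An SVM is a binary classifier, and our problem is binary: 'blueWins' is either 1 or 0. Note that the sigmoid kernel is a hyperbolic tangent of the scaled dot product of two samples, with values in the range (-1, 1), while the logistic sigmoid function maps any real-valued input to the range (0, 1). The SVM itself predicts from the sign of its decision value.

We can think of this almost like saying

“if the decision value is positive, the blue team wins; if it is negative, the blue team loses”.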
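A minimal sketch of an SVM with a sigmoid kernel in scikit-learn follows. The data is synthetic, and gamma="scale" and coef0=0.0 are assumed settings, not necessarily the ones behind the reported accuracy.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 400
# Synthetic stand-ins for 'blueKills' and 'redKills'.
blue_kills = rng.poisson(6, n)
red_kills = rng.poisson(6, n)
X = np.column_stack([blue_kills, red_kills]).astype(float)
y = (blue_kills - red_kills + rng.normal(0, 3, n) > 0).astype(int)

# Scale first, then fit an SVM whose kernel is tanh(gamma * <x, x'> + coef0).
model = make_pipeline(
    StandardScaler(), SVC(kernel="sigmoid", gamma="scale", coef0=0.0)
)
model.fit(X, y)

# The sign of the decision value picks the class: positive means "blue wins".
decision_values = model.decision_function(X[:5])
predictions = model.predict(X[:5])
print(decision_values.round(2), predictions)
```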

Evaluation

The accuracy ranges from 0.59 to 0.61, about 0.6 on average. This means we can predict the outcome of a League of Legends game with an accuracy of about 60%.

F1 Score:

You can calculate the F1 score for binary prediction problems. It is the harmonic mean of precision and recall.

Caveats

The main problem with the F1 score is that it gives equal weight to precision and recall. We might sometimes need to include domain knowledge in our evaluation where we want to have more recall or more precision.

Since the difference between precision and recall is very small here, the F1 score differs little from either of them.
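The F1 score and the caveat about equal weighting can be illustrated with the more general F-beta score, which F1 is a special case of. The precision and recall values below are hypothetical and chosen far apart so the effect is visible; f_beta is a helper written for this sketch.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives the F1 score."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical precision and recall values, chosen only for illustration.
precision, recall = 0.60, 0.80

f1 = f_beta(precision, recall)            # equal weight for both
f2 = f_beta(precision, recall, beta=2.0)  # recall counts more than precision
print(round(f1, 3), round(f2, 3))
```

A beta above 1 moves the score toward recall and a beta below 1 toward precision, which is one way to bring domain knowledge into the evaluation.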


Collecting

Nearest neighbor

Testing whether blueKills and blueDeaths can predict whether blue wins. Accuracy: 0.6502024291497975
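For reference, a nearest neighbor test of this kind could look like the sketch below. The data is synthetic, and k = 5 and the 80/20 split are assumptions, so the resulting accuracy is not comparable to the number above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
n = 500
# Synthetic stand-ins for 'blueKills' and 'blueDeaths'.
blue_kills = rng.poisson(6, n)
blue_deaths = rng.poisson(6, n)
X = np.column_stack([blue_kills, blue_deaths]).astype(float)
y = (blue_kills - blue_deaths + rng.normal(0, 3, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each test game gets the majority label of its 5 closest training games.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_accuracy = knn.score(X_test, y_test)
print(f"KNN test accuracy: {knn_accuracy:.3f}")
```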

Logistic regression

Testing whether blueKills, redKills, blueDeaths and redDeaths can predict whether blue wins.

Accuracy: 0.698739953823133

KNN and logistic regression use similar data. Logistic regression has the higher accuracy, so it is the more suitable of the two.

SVM

Testing whether blueKills and redKills can predict blueWins. Accuracy: 0.5940912010276173


This one has the lowest accuracy, so I wouldn't use it.

Conclusion

Based on all the exploration of the data in this document, we can draw some conclusions. League of Legends is a big game with a lot of factors at play. None of the machine learning algorithms gets far above the known 50/50 win rate. The highest accuracy score we have is for logistic regression, with a score of 0.70.

So it is possible to predict the outcome of a game from the first 10 minutes, but only with an accuracy of 70%, which cannot always be trusted. 100% is not possible in this domain, or at least not at the 10-minute mark. There will always be a chance for the tables to turn.